This document discusses managing unstructured JSON data in Oracle databases. It describes how a company initially stored JSON files in VARCHAR2 columns, but then the files grew larger than 4000 characters requiring a change to CLOB storage. This change caused issues until developers understood that CLOBs have different access, storage, and processing mechanisms compared to VARCHAR2. The document provides an overview of CLOB architecture including data access, internal storage, caching, logging, and indexing. It emphasizes that properly understanding CLOBs is important when storing and manipulating JSON data in Oracle databases.
Managing Unstructured Data in JSON and LOBs
1. 1 of 55
Managing Unstructured Data:
LOBs in the World of JSON
Michael Rosenblum
www.dulcian.com
2. 2 of 55
Who Am I? – “Misha”
Oracle ACE
Co-author of 3 books
PL/SQL for Dummies
Expert PL/SQL Practices
Oracle PL/SQL Performance Tuning Tips & Techniques
Known for:
SQL and PL/SQL tuning
Complex functionality
Code generators
Repository-based development
3. 3 of 55
Once upon a time…
There was a company that…
Received JSON files from external sources
Stored them in the database as VARCHAR2(4000) column
Manipulated JSON locally
Relied on triggers to do the audit
And SUDDENLY…
5. 5 of 55
JSON files are now
BIGGER THAN 4000 CHARs
6. 6 of 55
Sad story begins
Somewhere in the dungeon in DBA land:
Let’s convert all storage to CLOB!
… well, it’s designed to store unlimited data volume, isn’t it?
Somewhere in the Never Never Land management office:
Let’s make all changes in 24 hours – or ELSE!!
… well, we are agile, aren’t we?
Somewhere in the castle in Developer land:
Let’s assign whoever is available to make changes NOW!!!
… well, SQL and PL/SQL are easy, aren’t they?
7. 7 of 55
Sad Story continues…
“Whoever was available” was the Sorcerer’s Apprentice
… AKA Junior Developer
8. 8 of 55
… and continues…
First 24 hours:
PL/SQL is not a strongly typed language, so…
the same functions can use both VARCHAR2 and CLOB, so…
everything should work without changes, so I am …
… DONE!
9. 9 of 55
End of the World #1
Action:
Changes to storage are deployed to PROD / No code changes
Impact:
Major problems that bring the whole system to a standstill
10. 10 of 55
Rescue #1
Action:
Call a friend!
Suggestion from friend:
H-m-m-m, are you using concatenation? BAD IDEA!
11. 11 of 55
Rescue #1: Original
create or replace procedure p_concat is
    v_cl clob;
begin
    for c in (select object_name
              from all_objects where rownum<=20000)
    loop
        v_cl:=v_cl||c.object_name;
    end loop;
end;
12. 12 of 55
Rescue #1: Proposal
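create or replace procedure p_append_buffer is
    v_cl clob;
    v_buffer_tx varchar2(32767);

    -- write the buffered text to the CLOB in a single call
    procedure p_flush is
    begin
        if v_buffer_tx is not null then -- writeappend rejects an empty buffer
            dbms_lob.writeappend(v_cl,length(v_buffer_tx), v_buffer_tx);
            v_buffer_tx:=null;
        end if;
    end;

    -- collect small pieces in the VARCHAR2 buffer; flush only when it is full
    procedure p_add (i_tx varchar2) is
    begin
        if length(i_tx)+length(v_buffer_tx)>32767 then
            p_flush;
            v_buffer_tx:=i_tx;
        else
            v_buffer_tx:=v_buffer_tx||i_tx;
        end if;
    end;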
13. 13 of 55
Rescue #1: Proposal (continued)
begin
    dbms_lob.createtemporary(v_cl,true,dbms_lob.call);
    for c in (select object_name
              from all_objects where rownum<=20000)
    loop
        p_add(c.object_name);
    end loop;
    p_flush;
end;
14. 14 of 55
Rescue #1: impact
SQL> exec runstats_pkg.rs_start;
SQL> exec p_concat;
SQL> exec runstats_pkg.rs_middle;
SQL> exec p_append_buffer;
SQL> exec runstats_pkg.rs_stop(100);
----------------------------------------------------------------------------------
Type  Name                                           Run1         Run2         Diff
----- -------------------------------------- ------------ ------------ ------------
TIMER cpu time (hsecs)                                214          100         -114
TIMER elapsed time (hsecs)                            227          106         -121
STAT  lob writes                                   10,912           22      -10,890
STAT  db block changes                             19,767          424      -19,343
STAT  db block gets                                56,985        1,141      -55,844
STAT  session logical reads                       119,538       52,796      -66,742
STAT  logical read bytes from cache           979,255,296  432,504,832 -546,750,464
Way faster and cheaper!
15. 15 of 55
… and continues?
Next 24 hours:
All concatenation operations are quickly changed to
DBMS_LOB all over the system
… DONE???
16. 16 of 55
End of the World #2
Action:
Changes deployed to PROD with minimal testing
Impact:
Audit records are not generated!
17. 17 of 55
Rescue #2
Action:
Call a DIFFERENT friend!
Suggestion from that (different) friend:
Are you using DBMS_LOB.WRITEAPPEND?? BAD IDEA!
18. 18 of 55
Rescue #2: Analysis
CREATE TABLE json_tab_cl
(id number primary key,
demo_cl CLOB,
CONSTRAINT json_tab_cl_chk CHECK (demo_cl IS JSON) );
CREATE OR REPLACE TRIGGER json_tab_cl_biud
BEFORE INSERT OR DELETE OR UPDATE ON json_tab_cl FOR EACH ROW
BEGIN
IF inserting THEN dbms_output.put_line
('Row-before-INSERT');
ELSIF updating THEN dbms_output.put_line
('Row-before-UPDATE');
ELSIF deleting THEN dbms_output.put_line
('Row-before-DELETE');
END IF;
END;
/
19. 19 of 55
Rescue #2: analysis (continued)
SQL> INSERT INTO json_tab_cl(id,demo_cl) VALUES (1,empty_clob());
Row-before-INSERT
1 row created.
SQL> DECLARE
2 v_cl CLOB;
3 BEGIN
4 SELECT demo_cl
5 INTO v_cl
6 FROM json_tab_cl WHERE id = 1 FOR UPDATE;
7 dbms_lob.writeAppend(v_cl,16,'{"name":"Misha"}');
8 END;
9 /
SQL> SELECT id, demo_cl FROM json_tab_cl;
ID DEMO_CL
---------- --------------------------------
1 {"name":"Misha"}
INSERT trigger fired
UPDATE trigger didn’t!
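One way to keep the audit trigger involved, offered here as a sketch under assumptions and not as the fix the team in the story actually chose, is to build the new value in a temporary CLOB and write it back with an ordinary UPDATE statement. The UPDATE is regular DML, so row-level triggers fire. The table and column names are the ones from the analysis slides; the variable names are made up for illustration.

```sql
DECLARE
  v_cl CLOB;
  v_tx VARCHAR2(100) := '{"name":"Misha"}';  -- illustrative payload
BEGIN
  -- Build the new document in a temporary LOB instead of
  -- changing the stored LOB through its locator.
  dbms_lob.createTemporary(v_cl, true, dbms_lob.call);
  dbms_lob.writeAppend(v_cl, length(v_tx), v_tx);

  -- A plain UPDATE is DML, so the BEFORE UPDATE row trigger fires.
  UPDATE json_tab_cl
     SET demo_cl = v_cl
   WHERE id = 1;

  -- Release the temporary LOB once its contents have been stored.
  dbms_lob.freeTemporary(v_cl);
END;
/
```

With serveroutput enabled, the trigger from the analysis slide should print its Row-before-UPDATE message for this block.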
20. 20 of 55
NOT Using DBMS_LOB BAD IDEA
Using DBMS_LOB also BAD IDEA???
21. 21 of 55
The Moral of the Story
CLOBs are NOT just larger VARCHAR2:
Access mechanisms are different
Storage mechanisms are different
Processing mechanisms are different
Yes, you need to know about all of that, because…
25. 25 of 55
What about Oracle (0)?
20c and above – JSON datatype! But…
20c will not be released on-premises at all
21c has only just been announced
Key point: “innovation releases” are fun to experiment with,
but questionable as production environments
26. 26 of 55
What about Oracle (1)?
Production-ready (19c and below) - JSON still doesn’t
have its own STORAGE datatype – the only option is:
Store as Varchar2/CLOB/BLOB + CHECK constraint
Manually manage all storage properties
CREATE TABLE trigger_json_tab
(id number primary key,
demo_cl CLOB,
CONSTRAINT trg_json_tab_chk CHECK (demo_cl IS JSON)
);
27. 27 of 55
What about Oracle (2)?
JSON PL/SQL types (JSON_ELEMENT_T,
JSON_OBJECT_T etc) are about granular manipulation
in procedural code
SQL/JSON elements (JSON_VALUE, JSON_ARRAY
etc) are about efficient data extraction in SQL queries
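The two families can be contrasted with a small sketch. It assumes Oracle Database 12.2 or later (where JSON_OBJECT_T is available), the json_tab_cl table used elsewhere in this deck, and a row with id = 1 that contains a "name" attribute. The "role" attribute is hypothetical and exists only for illustration.

```sql
DECLARE
  v_cl  CLOB;
  v_obj JSON_OBJECT_T;
BEGIN
  SELECT demo_cl INTO v_cl FROM json_tab_cl WHERE id = 1;

  -- PL/SQL types: parse once, then work with the in-memory object.
  v_obj := JSON_OBJECT_T.parse(v_cl);
  v_obj.put('role', 'presenter');                  -- hypothetical attribute
  dbms_output.put_line(v_obj.get_string('name'));

  -- Serialize back to a CLOB and store it with regular DML.
  v_cl := v_obj.to_clob();
  UPDATE json_tab_cl SET demo_cl = v_cl WHERE id = 1;
END;
/

-- SQL/JSON: extract a scalar value directly in a query.
SELECT json_value(demo_cl, '$.name') AS name_tx
  FROM json_tab_cl
 WHERE id = 1;
```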
28. 28 of 55
So?
If you store JSON data
and DON’T manipulate it within Oracle you need to
understand CLOB architecture
And DO manipulate it within Oracle you REALLY need
to understand CLOB architecture
30. 30 of 55
Data Access
Problem
How can you access gigabytes of data?
Solution
Separate LOB data from LOB locator
LOB locator – special logical entity that points to LOB data and allows
communication with it.
Types of LOB operations:
Copy semantics - Data alone is copied from source to destination and a new locator
is created for the new LOB.
Reference semantics - Only the locator is copied (without changing the underlying
data).
31. 31 of 55
Data Access
Declare
v_cl CLOB;
Begin
select a_cl
into v_cl
from t_table;
End;
[Diagram: the variable v_cl holds the locator; the locator points to the LOB data]
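A minimal sketch of working through a locator, assuming the json_tab_cl table from later in this deck and an existing row with id = 1. The SELECT places a locator in the variable, and DBMS_LOB calls then read the data through it on demand.

```sql
DECLARE
  v_cl CLOB;
BEGIN
  -- The variable receives a locator for the stored LOB.
  SELECT demo_cl INTO v_cl FROM json_tab_cl WHERE id = 1;

  -- Data is read through the locator only when it is asked for.
  dbms_output.put_line('Length: ' || dbms_lob.getlength(v_cl));
  dbms_output.put_line('First 20 chars: ' || dbms_lob.substr(v_cl, 20, 1));
END;
/
```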
32. 32 of 55
LOB Data States
Variable/column in the row can be in a number of different
states:
Null – exists but is not initialized
Empty – exists and has locator that doesn't point to any data
Since you can only access LOBs via locators, you must first create
them.
In some cases, an initial NULL value is needed.
Populated – exists, has locator, and contains real data
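The three states can be told apart with a short experiment. The table lob_state_demo is hypothetical and deliberately has no IS JSON check, so that all three states can be stored side by side.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE lob_state_demo
 (id number primary key,
  demo_cl CLOB
 );

INSERT INTO lob_state_demo VALUES (1, NULL);                -- Null: no locator
INSERT INTO lob_state_demo VALUES (2, empty_clob());        -- Empty: locator, no data
INSERT INTO lob_state_demo VALUES (3, '{"name":"Misha"}');  -- Populated

-- IS NULL separates state 1 from the others;
-- the length separates Empty from Populated.
SELECT id,
       CASE WHEN demo_cl IS NULL THEN 'NULL' ELSE 'NOT NULL' END AS null_state,
       dbms_lob.getlength(demo_cl) AS len
  FROM lob_state_demo
 ORDER BY id;
```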
33. 33 of 55
Internal LOB Data Storage
Persistent LOBs:
Represented as values in the table column
Participate in transactions (Commit, Rollback, generate logs)
Each LOB has its own storage structure separate from the table in
which it is located.
Temporary LOBs:
Created in temporary tablespace and released when no longer
needed
Created when LOB variable is instantiated
Become permanent when inserted into table
Variables cause IO!
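The life cycle of a temporary LOB can be made explicit, as in the sketch below. It uses only DBMS_LOB calls and no tables. Freeing the LOB by hand is optional here, because a LOB created with the dbms_lob.call duration is released when the call ends.

```sql
DECLARE
  v_cl CLOB;
BEGIN
  -- Explicitly create a cached temporary LOB (lives in the temporary tablespace).
  dbms_lob.createTemporary(v_cl, true, dbms_lob.call);
  dbms_lob.writeAppend(v_cl, 5, 'hello');

  -- istemporary returns 1 for a temporary LOB.
  dbms_output.put_line('Temporary? ' || dbms_lob.istemporary(v_cl));

  -- Release the storage as soon as the LOB is no longer needed.
  dbms_lob.freeTemporary(v_cl);
END;
/
```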
34. 34 of 55
IO Proof (1)
exec runstats_pkg.rs_start;
declare
v_cl CLOB;
begin
for i in 1..100 loop
v_cl:=v_cl||lpad('A',32000,'A');
end loop;
end;
/
exec runstats_pkg.rs_middle;
declare
v_cl CLOB;
begin
dbms_lob.createTemporary(v_cl, true,dbms_lob.call);
for i in 1..100 loop
dbms_lob.writeappend(v_cl, 32000, lpad('A',32000,'A'));
end loop;
end;
/
exec runstats_pkg.rs_stop(0);
Load 3.2 MB
35. 35 of 55
IO Proof (2)
...
----------------------------------------------------------------------------------
2. Statistics report
----------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ------------------------------------- ------------ ------------ ------------
STAT consistent gets 91 99 8
STAT lob writes 92 100 8
STAT db block changes 1,803 1,963 160
STAT db block gets 4,872 5,304 432
STAT db block gets from cache 4,872 5,304 432
STAT session logical reads 4,963 5,403 440
Lots of activities!
36. 36 of 55
LOB Implementations
Contemporary implementation (SecureFiles)
Introduced in Oracle Database 11g
Optimized for significant data flow
Constantly being extended
Old implementation (BasicFile)
Backward compatible in all versions 11g+
Well-understood by DBAs, but more resource-heavy
37. 37 of 55
Retention
LOBs have a special way to manage UNDO
BasicFile
Disabled (Default) – only to support consistent reads
Enabled – apply the same UNDO_RETENTION parameter as the regular data
SecureFile
Auto (Default) – used only to support consistent reads
None – no UNDO is required
MAX <N> - keep up to N MB of UNDO
MIN <N> - guarantee at least N seconds of retention
38. 38 of 55
Navigation Using Indexes
Data is stored in chunks
Consist of one or more data blocks up to 32K
Each I/O operation works with one chunk.
SecureFiles: Chunks are dynamic and cannot be managed.
Chunks are navigated using indexes
Each LOB column is represented by 2 segments:
One for storing data; one to store the index
Each segment has the same storage properties as regular tables.
Some restrictions:
Cannot drop or rebuild LOB indexes
Cannot specify different properties for index and data segments
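The two segments behind a LOB column can be looked up in the data dictionary. The query below is a sketch that assumes the json_tab_cl table from this deck exists in the current schema.

```sql
-- One row per LOB column: the data segment, its index,
-- and the main storage properties discussed in this section.
SELECT column_name,
       segment_name,   -- LOB data segment
       index_name,     -- LOB index segment
       chunk,
       in_row,
       cache,
       logging,
       securefile
  FROM user_lobs
 WHERE table_name = 'JSON_TAB_CL';
```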
39. 39 of 55
LOB Operations
Require physical I/O
May have a high number of wait events
Nice cheat - storage “in row”
Enabled – data less than 4000 characters will look like VARCHAR2(4000)
Disabled – everything goes directly to LOB segment
40. 40 of 55
LOB Caching
Caching options:
NOCACHE (default) – OK for very large LOBs or those with infrequent access
CACHE – best option but requires a lot of read/write activity
CACHE READS – useful when creating LOB once and reading data from it
often.
NOCACHE implementation
BasicFile – Direct I/O. May cause major “hiccups” in the database
SecureFile – special Shared Pool area (SHARED_IO_POOL)
41. 41 of 55
LOB Logging
Logging options:
Logging
Nologging (BasicFiles) – the same meaning as for table
FILESYSTEM_LIKE_LOGGING (SecureFiles) – keep the catalogue of LOBs
without generating logs on the data itself
Faster initial recovery (table is accessible)
LOBs have to be reloaded.
CACHE always implies LOGGING.
By default, logging option is inherited from the table.
42. 42 of 55
SecureFile extras: Free
Fragment operations – direct modifications to LOBs
without reloading (32K limit)
dbms_lob.Fragment_Insert
dbms_lob.Fragment_Delete
dbms_lob.Fragment_Move
dbms_lob.Fragment_Replace
43. 43 of 55
SecureFile Extras: Options
Oracle Advanced Compression Option
De-duplication - keep only one copy of LOB if they match
exactly
Compression (High/Medium/Low) – trade-off between storage
and CPU
Oracle Advanced Security Option
Encryption – direct implementation of Transparent Data
Encryption (TDE)
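As a sketch of how the Advanced Compression features are switched on, the DDL below declares a SecureFile LOB with compression and de-duplication. The table name json_tab_cl_comp is hypothetical, and the statement assumes the option is licensed and that the default tablespace supports SecureFiles.

```sql
CREATE TABLE json_tab_cl_comp   -- hypothetical table name
 (id number primary key,
  demo_cl CLOB,
  CONSTRAINT json_tab_cl_comp_chk CHECK (demo_cl IS JSON)
 )
 LOB (demo_cl) STORE AS SECUREFILE (
   COMPRESS MEDIUM   -- trade-off between storage and CPU
   DEDUPLICATE       -- keep a single copy of identical LOBs
 );
```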
45. 45 of 55
Playground (1)
CREATE TABLE json_tab_vc2
(id number primary key,
demo_tx varchar2(4000),
CONSTRAINT json_tab_vc2_chk CHECK (demo_tx IS JSON)
);
insert into json_tab_vc2 values (1,'{"name":"Misha"}');
CREATE TABLE json_tab_cl
(id number primary key,
demo_cl CLOB,
CONSTRAINT json_tab_cl_chk CHECK (demo_cl IS JSON)
);
46. 46 of 55
Playground (2)
declare
v_cl CLOB;
procedure p_add (i_tx varchar2) is
begin
dbms_lob.writeAppend(v_cl,length(i_tx),i_tx);
end;
begin
dbms_lob.createTemporary(v_cl, true,dbms_lob.call);
p_add('{');
for i in 1..1000 loop
p_add('"tag'||i||'":"value'||i||'",');
end loop;
p_add('"name":"ClobDemo"');
p_add('}');
insert into json_tab_cl values (100,v_cl);
end;
Above 4K!
47. 47 of 55
Single operation
exec runstats_pkg.rs_start;
select json_value(demo_cl,'$.name') from json_tab_cl;
exec runstats_pkg.rs_middle;
select json_value(demo_tx,'$.name') from json_tab_vc2;
exec runstats_pkg.rs_stop(1);
-------------------------------------------------------------------------------------
2. Statistics report
-------------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------------------- ------------ ------------ ------------
STAT physical read IO requests 1 0 -1
STAT securefile direct read ops 1 0 -1
STAT physical reads direct (lob) 3 0 -3
STAT physical read bytes 24,576 0 -24,576
STAT securefile direct read bytes 24,576 0 -24,576
48. 48 of 55
Two operations
…
select json_value(demo_cl,'$.name') from json_tab_cl
where json_exists(demo_cl,'$.name' false on error);
…
select json_value(demo_tx,'$.name') from json_tab_vc2
where json_exists(demo_tx,'$.name' false on error);
…
-------------------------------------------------------------------------------------
2. Statistics report
-------------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------------------- ------------ ------------ ------------
STAT physical read IO requests 2 0 -2
STAT securefile direct read ops 2 0 -2
STAT physical reads direct (lob) 6 0 -6
STAT physical read bytes 49,152 0 -49,152
STAT securefile direct read bytes 49,152 0 -49,152
IO impact for every operator?
50. 50 of 55
Multi-column (2)
-------------------------------------------------------------------------
2. Statistics report
-------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------- ------------ ------------ ------------
STAT physical read IO requests 1 0 -1
STAT securefile direct read ops 1 0 -1
STAT physical reads direct (lob) 3 0 -3
STAT physical read bytes 24,576 0 -24,576
STAT securefile direct read bytes 24,576 0 -24,576
Exactly as a single call
51. 51 of 55
JSON_VALUE vs JSON_TABLE
select name_tx, tag10_tx, tag100_tx
from json_tab_cl,
json_table(demo_cl,'$'
columns (name_tx varchar2(2000) path '$.name',
tag10_tx varchar2(2000) path '$.tag10',
tag100_tx varchar2(2000) path '$.tag100'
)
);
…
select json_value(demo_cl,'$.name'),
json_value(demo_cl,'$.tag10'),
json_value(demo_cl,'$.tag100')
from json_tab_cl;
-------------------------------------------------------------------------------------
Type Name Run1 Run2 Diff
----- ---------------------------------------- ------------ ------------ ------------
STAT physical read IO requests 1 1 0
STAT securefile direct read ops 1 1 0
STAT physical reads direct (lob) 3 3 0
STAT physical read bytes 24,576 24,576 0
STAT securefile direct read bytes 24,576 24,576 0
Multiple JSON_VALUE calls are combined!
52. 52 of 55
JSON Storage Best Practices
SecureFiles are strongly recommended
Compression can significantly decrease the database footprint
(If you can get that extra option)
Always keep CACHE enabled
But you can play with LOGGING parameter if there is a way to
quickly recover JSON from the external source.
Always have IN ROW enabled (small CLOBs would be
treated as VARCHAR2)
53. 53 of 55
JSON Manipulating Best Practices
PL/SQL JSON types are very resource-intensive:
May not be recommended if your resources are limited
… But they guarantee the data quality
Keep the number of operations to a minimum
… Especially for large JSON files (parsing is expensive!)
Manual changes to JSON
Keep CLOB operations to a minimum
Any CLOB operation on >4K of data causes IO
Use buffers whenever possible
PL/SQL VARCHAR2 limit is 32K!
54. 54 of 55
Summary
JSON is the new norm
… and we need to be able to efficiently store/manipulate it
Oracle provides everything you need to do it
… as long as you read the fine print
Newer versions of Oracle database will make JSON
operations easier
… but we are not there yet
55. 55 of 55
Contact Information
Michael Rosenblum – mrosenblum@dulcian.com
Dulcian, Inc. website - www.dulcian.com
Blog: wonderingmisha.blogspot.com
Available NOW:
Oracle PL/SQL Performance Tuning Tips & Techniques