Indexing HTML Attributes and Unique Indexes

# Randy Buchholz (6 years ago)

I've been working with Custom Elements and I'm writing a lot of code against tag attributes. In some cases, I want the attribute values to be unique on a page (like id). It got me wondering about how the engines handle attribute based searches, and if indexing (with unique/distinct options) would provide value. I also find myself writing a lot of boilerplate getters/setters for attributes in the elements. Attribute handling could be improved by adding some additional support with something like an attrib feature. This would be similar to get or set in use.

class MyElement extends HTMLElement{
    attrib myAttrib('my-attribute') index distinct;
}

This would create the attribute my-attribute on the tag and element, and also generate a getter and setter

    get myAttrib() { return this.getAttribute('my-attribute'); }
    set myAttrib(v) { this.setAttribute('my-attribute', v); }

The index flag would tell the engine to create a map/hash to speed up searches on heavily searched attributes. The distinct flag would indicate that all values for that attribute within a context (e.g., the document) should be unique. This might be used primarily by IDEs to generate warnings.
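
For comparison, roughly the same accessors can be generated today without new syntax. This is only a minimal sketch: defineAttributeAccessors is a hypothetical helper (not part of any standard), and it assumes only that instances provide getAttribute and setAttribute, as HTMLElement does. It does not cover the index or distinct flags.

```javascript
// Hypothetical helper: adds a getter/setter pair to a class prototype for
// each entry of a { propertyName: 'attribute-name' } map.
function defineAttributeAccessors(klass, attributes) {
  for (const [property, attribute] of Object.entries(attributes)) {
    Object.defineProperty(klass.prototype, property, {
      configurable: true,
      enumerable: true,
      // Read straight from the attribute, like the hand-written getter.
      get() { return this.getAttribute(attribute); },
      // Write straight to the attribute, like the hand-written setter.
      set(value) { this.setAttribute(attribute, value); },
    });
  }
  return klass;
}
```

In a browser this could be called as defineAttributeAccessors(MyElement, { myAttrib: 'my-attribute' }) before customElements.define.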

# Andrea Giammarchi (6 years ago)

With Custom Elements you have attributeChangedCallback which reacts after observedAttributes returned attributes, and I believe you'd like to use that to create getters and setters out of the box.

I don't think DOM specific class fields/syntax will ever land in JS itself, but I suggest looking at the handiest custom elements patterns here: gist.github.com/WebReflection/ec9f6687842aa385477c4afca625bbf4

About being unique, you can always document.querySelector('[attribute="' + value +'"]') and, if not null, throw an error because it's already live on the DOM.

However, IDs are the most "unique" thing you can have, even if 2 IDs with same content are still allowed love on the document.

If you are looking for an easy way to have unique IDs, remember you can start from let id = Math.random() and do ++id any other time to have a new, rarely clashing, unique name. Prefix it with the nodeName and you already have all the uniqueness you need for your custom elements, since you can't define two custom elements with the same name anyway (yet, unless scoped, but that's another story).
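
A minimal sketch of that idea, assuming a hypothetical createIdGenerator helper and a caller-supplied prefix such as the element's node name:

```javascript
// Hypothetical helper following the suggestion above: start from a random
// number, increment it for every new id, and prefix the element's name.
function createIdGenerator(prefix) {
  let id = Math.random();
  return () => `${prefix}-${++id}`;
}

// Example: one generator per custom element name.
const nextPersonId = createIdGenerator('custom-person');
const firstId = nextPersonId();
const secondId = nextPersonId();
```

The generated ids contain a dot, so they work as-is with getElementById but would need escaping before being used in a CSS selector.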

# Andrea Giammarchi (6 years ago)

live *

# Randy Buchholz (6 years ago)

Thanks for the link. My current approach is similar to what you and the article describe. Maybe it’s just the old DBA in me, but even when I narrow my parameters (node.querySelector(“[…]”)) it feels like I’m doing a lot of “full table scans” when I would want to index some of the “columns”. I’m sure the engines are pretty optimized for this though.

# Oriol _ (6 years ago)

About being unique, you can always document.querySelector('[attribute="' + value +'"]')

This code is vulnerable to CSS injection, input values shouldn't be inserted raw into queries! You can use CSS.escape to sanitize.
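
A minimal sketch of the escaped lookup. findByAttribute is a hypothetical helper name; CSS.escape is the standard browser API mentioned above. The attribute name is assumed to come from the developer, so only the value is escaped.

```javascript
// Hypothetical helper: finds the first element under `root` whose attribute
// `name` equals `value`. Only the value is treated as untrusted input.
function findByAttribute(root, name, value) {
  return root.querySelector(`[${name}="${CSS.escape(value)}"]`);
}
```

In a page, findByAttribute(document, 'my-attribute', userValue) returns either the matching element or null, without letting the value break out of the quoted string.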

# guest271314 (6 years ago)

Thanks for the link. My current approach is similar to what you and the article describe. Maybe it’s just the old DBA in me, but even when I narrow my parameters (node.querySelector(“[…]”)) it feels like I’m doing a lot of “full table scans” when I would want to index some of the “columns”. I’m sure the engines are pretty optimized for this though.

What do you mean by "Unique Indexes" (specifically unique within the scope of an HTML document indexes of elements are already unique) and "full table scans" (relevant to CSS specificity; that is, what code are you using now that is not capable of selecting specific elements, and attribute values)? CSS selectors can select any element by a variety of attribute name and value combinators, including using data-* attributes and Microdata. It is the responsibility of the developer to create unique names and values for HTML elements - and to not create duplicate ids. Is the HTML being used dynamic or static? Whether the HTML is dynamic or static Map and WeakMap can be used for "unique" key-value pairs of HTML elements, and HTML element attributes and values.

# Randy Buchholz (6 years ago)

Full Table Scans and Unique indexes are database concepts (that's the DBA reference). When a database searches for a record based on a column value, it might look at every record in the table to find the matches - scan the entire (full) table, in the order the records were inserted or stored. To speed this up, we can create indexes on table columns or column groups. These are like ordered maps or hash tables. To find a record, more efficient searches can be done against the indexes to find the records. Indexes can also act as constraints. A "Unique Index" is a constraint that checks a table to see if a value exists before inserting it in the table and adding it to the index. Indexing has a trade-off. It slows inserting, but improves searching. While understanding that databases and browsers are worlds apart, a foundational part of database engines is searching, just like it is in DOM manipulation. Indexing can provide orders of magnitude performance improvements when reading/searching in databases. It seemed worth seeing if the concept translated across technologies.

Without any optimizations, an attribute search on a document would look at each node, and then at each attribute of the node to find a match - Full Table Scan. This makes searches very slow. At an absurd extreme, we could index everything, making startup very slow and eating memory, but making some searches very fast. The balanced approach is to implement "indexing" ourselves (using any of the mentioned approaches) to get the best level.
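
One way such a hand-rolled index could look, as a rough sketch only. AttributeIndex is a hypothetical class, and the only thing it assumes about elements is that they have getAttribute.

```javascript
// A userland "index" on one attribute: value -> Set of elements.
// With { unique: true } it also acts as a constraint, like a unique index.
class AttributeIndex {
  constructor(attribute, { unique = false } = {}) {
    this.attribute = attribute;
    this.unique = unique;
    this.byValue = new Map();
  }

  // Index one element; elements without the attribute are skipped.
  add(element) {
    const value = element.getAttribute(this.attribute);
    if (value === null) return;
    let bucket = this.byValue.get(value);
    if (!bucket) this.byValue.set(value, (bucket = new Set()));
    if (this.unique && bucket.size > 0 && !bucket.has(element)) {
      throw new Error(`Duplicate ${this.attribute}="${value}"`);
    }
    bucket.add(element);
  }

  // Lookup is a single Map access instead of a walk over the tree.
  find(value) {
    return [...(this.byValue.get(value) ?? [])];
  }
}
```

Building the index still costs one full scan (for example over root.querySelectorAll('[prop]')), and this sketch does not keep itself in sync when attributes change later. That is the insert-versus-search trade-off described above.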

About the code/HTML, it is dynamic and real-time. It is loaded over WebSockets, and the elements are talking to the backend in real-time over the sockets. I'm using an original (Trygve Reenskaug) MVC approach. Essentially, each Web Component is an MVC component, with the HTML/elements and code accessed only through the controller. I am looking at the incoming code for cases where several searches are being performed on the same attribute (or element). I give these a generated id, create indexes on them, and expose them as properties on the controller. The underlying framework uses a set of common attributes that are searched on a lot, but only for a small set of elements. These are also indexed. So at the cost of slower startup (offset to some degree by doing some of this in a Web Worker and/or server-side), I can read and write "Form Fields" quickly.

Many language features are implemented to wrap or optimize common or repetitive use cases, or to move code to a more efficient part of the architecture. Indexing can do both. Without doing things server-side or in Workers, the indexing consumes UI cycles. Adding an indexing "hint" could allow all or part of this code to be moved back into the "system" or C++ layer (e.g., into querySelector internals supported by low-level map stores), or into parsing (like I'm doing), taking some of the repetitive work off the UI and out of developers' hands.

# guest271314 (6 years ago)

If the HTML elements have a unique id set there is no search to perform (document.getElementById("id")), correct?

Form fields can be created, set and changed using FormData objects without using HTML elements at all.

Still not gathering what is meant by unique indexes.

# Andrea Giammarchi (6 years ago)

it's meant there couldn't be two indexes with the same value, but even IDs can be duplicated on a page (it's not suggested or anything, but nothing prevents you to do that)

to be honest, since IDs already cover the whole story (IDs are accessible even via globalThis/window, no need to query the document) I guess this whole story is about having el.uid, as opposite of el.id, so that a uid cannot be duplicated (it throws if it is), and document.uid[unique-uid-value] would return, without querying, the live node (if any)

however, I think this whole discussion in here makes no sense, as JS itself has nothing to do with HTML 🤷‍♂️

# Randy Buchholz (6 years ago)

@Andrea Giammarchi, While the connection is secondary, HTML often serves as the specification for the creation of JS objects. And while it could be considered a sub-set, JS is full of HTML related features - HTMLElement for one. Thing is, if you are programming in JS for browser applications, you’re dealing with HTML-adjacent JS at some point. What I’m trying to do, though, somewhat supports your point. I see a lot of higher-level code manipulating HTML tags, which feels really wrong. Even dealing with HTMLElement in higher-level code doesn’t seem to make a lot of sense. I’m trying to encapsulate the elements and tags, and move that point as far into the background as I can.

@guest271314

If we think of Indexes as a type of key-value pairs, a “regular” Index allows duplicate keys, and a Unique Index requires unique keys. Indexes are always sorted on their keys. So in this case, when the index is built, it creates k-v pairs of attributeName-elementId, ordered by attributeName. To get all elements with a specific attribute, we just find the first one with that key, and keep reading - getElementById(elementId) - until the key changes.
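
The "find the first key, then keep reading until the key changes" lookup can be sketched as below. This is illustrative only; buildSortedIndex and lookup are hypothetical names, and the index is just a sorted array of [attributeName, elementId] pairs.

```javascript
// Build the "index": [attributeName, elementId] pairs sorted by attributeName.
function buildSortedIndex(pairs) {
  return [...pairs].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Binary search for the first entry with the key, then read until the key changes.
function lookup(index, key) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (index[mid][0] < key) low = mid + 1;
    else high = mid;
  }
  const ids = [];
  for (let i = low; i < index.length && index[i][0] === key; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}
```

Each returned id can then be passed to document.getElementById. A Map from attributeName to an array of ids would give the same result with less code; the sorted form is shown only because it mirrors how a database index is read.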

You’re right about id. I’m converting generic, multi-instance template “tags” into elements with ids, so I can access them directly without searching, just using getElementById. The template has been “localized” per instance, and encapsulated behind a controller. I want to avoid dealing with HTML, and even more HTTP verb related things like Form and FormData, and just deal with general JS objects, so I use Models instead of things like FormData.

So for example, the business perspective of a “Person” has “Age” data. A page may display multiple people at once.

<custom-person id='person1'></custom-person>
<custom-person id='person2'></custom-person>

The goal is to get from the source tag to non-html/element related JS as soon as possible.

The template behind this might look something like

<framework-container>
    <input prop='name' />
    <input prop='age' />
</framework-container>

When connectedCallback runs, it creates a View using the template

<sometag>
    <input id='person1_name' />
    <input id='person1_age' />
</sometag>

<sometag>
    <input id='person2_name' />
    <input id='person2_age' />
</sometag>

A Model

class person{
    name;
    age;
}

And a dynamically configured Controller and instance. A base Person class contains common functionality.

class Person1 extends Person{
    get name(){ return document.getElementById('person1_name'); }
    …
    get model(){ return this.populateModel(); }
}
self.controllers.add(new Person1());

Now I don’t need to deal with any HTML/element or “tag hierarchy” related JS. I pretty much abstract out the HTML and HTMLElement pieces when the Custom Element is initially loaded.

    const personAge = self.controllers.person1.age;
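
A rough sketch of how such a controller could be generated from the list of prop names. Everything here is an assumption for illustration: createController is a hypothetical function, getById stands in for document.getElementById.bind(document), and the properties read and write the field's value (the hand-written getter above returns the element itself).

```javascript
// Hypothetical sketch: builds a controller whose properties read and write
// the `value` of the element with the generated id `${instanceId}_${prop}`.
function createController(instanceId, props, getById) {
  const controller = {};
  for (const prop of props) {
    const elementId = `${instanceId}_${prop}`;
    Object.defineProperty(controller, prop, {
      enumerable: true,
      // Direct id lookup on every access; no attribute search involved.
      get: () => getById(elementId).value,
      set: (value) => { getById(elementId).value = value; },
    });
  }
  return controller;
}
```

In a page this might be used as createController('person1', ['name', 'age'], (id) => document.getElementById(id)).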

At a lower level, I can create attribute related properties using the dynamically assigned element id.

<input prop='name' attrib='style.color' />

This would end up creating a property or methods on the controller that allows me to not have to deal with styles and CSS classes directly, and even constrain values.

self.controllers.person1.name.color = "red";

So the whole index thing started when I was loading/parsing dynamic html/JS code and searching for prop and attrib repeatedly. If I know I’m going to be searching on an attribute a lot, maybe I could give the parser/engine a hint it could use to optimize that search.

# Andrea Giammarchi (6 years ago)

while it could be considered a sub-set, JS is full of HTML related features - HTMLElement for one.

HTMLElement is defined by the living DOM standard (WHATWG) html.spec.whatwg.org/multipage/dom.html#htmlelement

it has nothing to do with JS.

JS is a general purpose programming language that implements the ECMAScript standard, which on the Web gets enriched with some functionality, while on NodeJS it gets enriched with some other (and indeed HTMLElement doesn't exist there).

In GJS (Desktop UI) it has other features too, so asking in a JS related mailing list to bring in something strictly DOM related (WHATWG) is not appropriate.

Historically speaking, the only strictly DOM related things that went in were methods like String.prototype.blink and others, but today JS is really not Web based anymore, even if the Web is one of its primary targets (but then again, with WASM around, any programming language can target the Web, so you want this proposal to land in WHATWG, not here).

# Randy Buchholz (6 years ago)

Sorry. My confusion.

# guest271314 (6 years ago)

I want to avoid dealing with HTML

Using HTML is part of premise of the proposal, correct?

Am still not sure what the actual requirement is.

If the requirement is to prevent duplicate values being input by the user you can utilize pattern attribute of <input> with a RegExp which matches the current values of <input> elements, oninvalid and checkValidity() which will provide the functionality of the value attribute of <input> and <select> elements being unique as to a <form> element.
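
A sketch of how such a pattern could be built from the values already in use. uniquePattern and escapeRegExp are hypothetical helpers; the result is meant to be assigned to an input's pattern attribute, which browsers match against the whole value.

```javascript
// Escape the characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns a pattern that accepts any value except the ones already taken,
// using a negative lookahead over the escaped existing values.
function uniquePattern(existingValues) {
  if (existingValues.length === 0) return '.*';
  const taken = existingValues.map(escapeRegExp).join('|');
  return `(?!(?:${taken})$).*`;
}
```

With input.pattern = uniquePattern(otherValues), a later input.checkValidity() would report a duplicate as invalid and fire the invalid event. The pattern has to be rebuilt whenever the other values change.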

If there is no user input there should not be any issue creating unique key-value pairs using Map; WeakMap; Set, or other means.

and even more HTTP verb related things like Form and FormData and just deal with general JS objects, so I use Models instead of things like FormData.

Am not certain what a "Model" is.

A FormData object can be serialized and represented in various manners; including as an array of JavaScript arrays of key-value pairs that can be adjusted to the exact keys and values required [...formData]; multipart/form-data; etc. An earlier post mentioned forms at

I can read and write "Form Fields" quickly
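
As a small illustration of the [...formData] conversion mentioned above, with no HTML elements involved. The field names and values are made-up sample data; FormData is available in browsers and as a global in recent Node.js versions.

```javascript
// Build a FormData object directly, without a <form> element.
const formData = new FormData();
formData.append('name', 'Ada');
formData.append('age', '36');

// Spreading gives an array of [key, value] arrays.
const entries = [...formData];

// Object.fromEntries turns those pairs into a plain object
// (for repeated keys, the last value wins).
const model = Object.fromEntries(entries);
```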

Is user input involved in the procedure relevant to "Indexing HTML Attributes and Unique Indexes"?

What are you trying to achieve that you are not able to with the current code?

What do you consider to be "general JS objects"?

# Isiah Meadows (6 years ago)

You'd have better luck asking for this feature in discourse.wicg.io. ES Discuss is about the JS language itself and the related ECMAScript spec, not the Web APIs that are implemented in most browsers, usually separately to the JS implementations themselves.


Isiah Meadows contact at isiahmeadows.com, www.isiahmeadows.com