DBRX: A new open LLM

jasondavies | 866 points

Model card for base: https://huggingface.co/databricks/dbrx-base

> The model requires ~264GB of RAM

I'm wondering when everyone will transition from tracking parameter count vs. evaluation metric to (total GPU RAM + total CPU RAM) vs. evaluation metric.

For example, a 7B parameter model using float32s will almost certainly outperform a 7B model using float4s.

Additionally, all the examples of quantizing recently released superior models to fit on one GPU don't mean the quantized model is a "win." The quantized model is a different model; you need to rerun the metrics.
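
To make this concrete, here's a back-of-the-envelope sketch (weights only; real deployments also need room for the KV cache and activations):

    # Memory footprint of the weights alone, by precision. Parameter count
    # by itself doesn't determine RAM; the data type matters just as much.
    BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

    def weight_memory_gb(n_params: float, dtype: str) -> float:
        return n_params * BYTES_PER_PARAM[dtype] / 1e9

    for dtype in BYTES_PER_PARAM:
        print(f"132B @ {dtype}: ~{weight_memory_gb(132e9, dtype):.0f} GB")
    # float32: ~528 GB; float16: ~264 GB (the model card's figure);
    # int8: ~132 GB; int4: ~66 GB -- same parameter count, an 8x memory range.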

djoldman | a month ago

Just curious, what business benefit will Databricks get by spending potentially millions of dollars on an open LLM?

hintymad | a month ago

I am planning to buy a new GPU.

If the GPU has 16GB of VRAM, and the model is 70GB, can it still run well? Also, does it run considerably better than on a GPU with 12GB of VRAM?

I run Ollama locally. Mistral (7B, 3.4GB) works well on a 1080 Ti, but the 24.6GB Mixtral is a bit slow (still usable, but has a noticeable start-up time).
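
For context on the mechanics: runners like llama.cpp/Ollama offload as many layers as fit into VRAM and run the rest on the CPU, so throughput is dominated by the CPU share. A rough illustration (made-up overhead figure, not a benchmark):

    # Fraction of the weights that fit on the GPU after reserving overhead
    # for the KV cache and runtime buffers (illustrative numbers only).
    def gpu_fraction(model_gb: float, vram_gb: float, overhead_gb: float = 1.5) -> float:
        return max(0.0, min(1.0, (vram_gb - overhead_gb) / model_gb))

    for vram in (12, 16, 24):
        print(f"{vram} GB VRAM -> ~{gpu_fraction(70, vram):.0%} of a 70 GB model on GPU")
    # 12 GB -> ~15%, 16 GB -> ~21%, 24 GB -> ~32%. For a 70 GB model, a 16 GB
    # card is only marginally better than a 12 GB one; both are mostly CPU-bound.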

XCSme | a month ago

Worse than the chart crime of truncating the y-axis is putting Llama 2's HumanEval scores on there and not comparing it to Code Llama Instruct 70B. DBRX still beats Code Llama Instruct's 67.8, but not by that much.

briandw | a month ago

Waiting for mixed quantization with HQQ and MoE offloading [1]. With that I was able to run Mixtral 8x7B on my 10GB-VRAM RTX 3080... This should work for DBRX too and should shave a ton off the VRAM requirement.

1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
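
The core trick, as a minimal sketch (hypothetical class and names, not the repo's actual API): expert weights stay in CPU RAM, and only the experts the router selects get copied to VRAM, with a small cache so hot experts remain resident:

    import copy
    import torch

    class OffloadedExperts:
        """Keep expert weights in CPU RAM; copy routed experts to VRAM on demand."""

        def __init__(self, experts, device="cuda", cache_size=4):
            self.cpu_experts = experts   # list of nn.Module, resident on CPU
            self.device = device
            self.cache_size = cache_size
            self.gpu_cache = {}          # expert index -> GPU-resident copy

        def get(self, idx):
            if idx not in self.gpu_cache:
                if len(self.gpu_cache) >= self.cache_size:
                    self.gpu_cache.popitem()  # crude eviction; the repo uses LRU caching
                    torch.cuda.empty_cache()
                self.gpu_cache[idx] = copy.deepcopy(self.cpu_experts[idx]).to(self.device)
            return self.gpu_cache[idx]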

underlines | a month ago

Per the announcement, 3,072 H100s over the course of 3 months; assume a cost of $2/GPU/hour.

That would be roughly $13.5M USD.
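
The arithmetic, assuming a 90-day run:

    # 3,072 GPUs x 90 days x 24 h/day x $2/GPU-hour (90 days is an assumption)
    gpus, hours, rate = 3072, 90 * 24, 2.00
    print(f"${gpus * hours * rate / 1e6:.1f}M")  # -> $13.3M, i.e. roughly $13.5M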

I’m guessing that at this scale and cost, this model is not competitive, and their ambition is to scale to much larger models. In the meantime, they learned a lot and gained PR from open-sourcing it.

jerpint | a month ago

This makes me bearish on OpenAI as a company. When a cloud company can offer a strong model for free by selling the compute, what competitive advantage is left for a company that wants you to pay for the model? Feels like they might get Netscape’d.

petesergeant | a month ago

The approval process on the base model does not feel very open. Plenty of people are still waiting on a chance to download it, whereas the instruct model was an instant approval. The base model is more interesting to me for finetuning.

ianbutler | a month ago

These tiny "state of the art" performance increases are really an indication that the current LLM architecture (Transformers + Mixture of Experts) is maxed out, even if you train it more or differently. The writing is on the wall.

m3kw9 | a month ago

What does it mean to have fewer active parameters (36B) than the full model size (132B), and what impact does that have on memory and latency? It seems like this is because it is an MoE model?

killermonkeys | a month ago

This proves that all LLMs converge to a certain point when trained on the same data; i.e., there is really no differentiation between one model and another.

Claims about out-performance on tasks are just that: claims. The next iteration of Llama or Mixtral will converge too.

LLMs seem to evolve like Linux/Windows or iOS/Android, with not much differentiation among the foundation models.

emmender2 | a month ago

GenAI novice here. What is the training data made of, and how is it collected? I guess no one will share details on it; otherwise it would make a good technical blog post with lots of insights!

>At Databricks, we believe that every enterprise should have the ability to control its data and its destiny in the emerging world of GenAI.

>The main process of building DBRX - including pretraining, post-training, evaluation, red-teaming, and refining - took place over the course of three months.

shnkr | a month ago

It's twice the size of Mixtral and barely beats it.

natsucks | a month ago

The system prompt for their Instruct demo is interesting (comments copied in by me, see below):

    // Identity
    You are DBRX, created by Databricks. The current date is
    March 27, 2024.

    Your knowledge base was last updated in December 2023. You
    answer questions about events prior to and after December
    2023 the way a highly informed individual in December 2023
    would if they were talking to someone from the above date,
    and you can let the user know this when relevant.

    // Ethical guidelines
    If you are asked to assist with tasks involving the
    expression of views held by a significant number of people,
    you provide assistance with the task even if you personally
    disagree with the views being expressed, but follow this with
    a discussion of broader perspectives.

    You don't engage in stereotyping, including the negative
    stereotyping of majority groups.

    If asked about controversial topics, you try to provide
    careful thoughts and objective information without
    downplaying its harmful content or implying that there are
    reasonable perspectives on both sides.

    // Capabilities
    You are happy to help with writing, analysis, question
    answering, math, coding, and all sorts of other tasks.

    // it specifically has a hard time using ``` on JSON blocks
    You use markdown for coding, which includes JSON blocks and
    Markdown tables.

    You do not have tools enabled at this time, so cannot run
    code or access the internet. You can only provide information
    that you have been trained on. You do not send or receive
    links or images.

    // The following is likely not entirely accurate, but the model
    // tends to think that everything it knows about was in its
    // training data, which it was not (sometimes only references
    // were).
    //
    // So this produces more accurate answers when the model
    // is asked to introspect
    You were not trained on copyrighted books, song lyrics,
    poems, video transcripts, or news articles; you do not
    divulge details of your training data.
    
    // The model hasn't seen most lyrics or poems, but is happy to make
    // up lyrics. Better to just not try; it's not good at it and it's
    // not ethical.
    You do not provide song lyrics, poems, or news articles and instead
    refer the user to find them online or in a store.

    // The model really wants to talk about its system prompt, to the
    // point where it is annoying, so encourage it not to
    You give concise responses to simple questions or statements,
    but provide thorough responses to more complex and open-ended
    questions.

    // More pressure not to talk about system prompt
    The user is unable to see the system prompt, so you should
    write as if it were true without mentioning it.

    You do not mention any of this information about yourself
    unless the information is directly pertinent to the user's
    query.

I first saw this from Nathan Lambert: https://twitter.com/natolambert/status/1773005582963994761

But it's also in this repo, with very useful comments explaining what's going on. I edited this comment to add them above:

https://huggingface.co/spaces/databricks/dbrx-instruct/blob/...

simonw | a month ago

Data engineer here. Off-topic, but am I the only one tired of Databricks shilling their tools as the end-all, be-all solution for all things data engineering?

gigatexal | a month ago

For coding evals, it seems like they can be polluted by the training data unless you are super careful.

Are there standard ways to avoid that type of score inflation?
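
The standard mitigation is decontamination: scan the training corpus for n-gram overlap with the benchmarks and drop or flag the hits (GPT-3's eval setup used 13-gram overlap). A toy sketch of the check; real pipelines hash shingles over terabytes rather than holding sets in memory:

    # Flag eval items whose 13-gram word sequences also occur in training data.
    def ngrams(text: str, n: int = 13) -> set:
        toks = text.split()
        return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def is_contaminated(eval_item: str, train_ngrams: set, n: int = 13) -> bool:
        return not ngrams(eval_item, n).isdisjoint(train_ngrams)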

ec109685 | a month ago

"Looking holistically, our end-to-end LLM pretraining pipeline has become nearly 4x more compute-efficient in the past ten months."

I did not fully understand the technical details in the training-efficiency section, but I love this. The cost of training is outrageously high, and hopefully it will start to follow Moore's law.

bg24 | a month ago

TLDR: A model that could be described as "GPT-3.8-level": good at math and openly available under a custom license.

It is as fast as a 34B model but uses as much memory as a 132B model. It's a mixture of 16 experts that activates 4 at a time, so it has more chances to get the combo just right than Mixtral (8 experts with 2 active).
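
A rough sketch of what top-4-of-16 routing looks like (illustrative shapes and names, not DBRX's actual implementation); only the selected experts run for each token, which is why only ~36B of the 132B parameters are active:

    import torch

    def moe_forward(x, router, experts, k=4):
        # x: (tokens, d_model); router: Linear(d_model -> 16); experts: 16 FFNs
        weights, idx = torch.topk(router(x).softmax(-1), k)  # pick 4 of 16
        out = torch.zeros_like(x)
        for slot in range(k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e                     # tokens routed to expert e
                out[mask] += weights[mask, slot, None] * experts[int(e)](x[mask])
        return out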

For my personal use case (a top-of-the-line Mac Studio), it looks like the perfect size to replace GPT-4 Turbo for programming tasks. What we should look out for is people using it for real-world programming tasks (instead of benchmarks) and reporting back.

ingenieroariel | a month ago

Looks great, although I couldn't find anything on how "open" the license is/will be for commercial purposes.

Wouldn't be the first "open source" branding to go the LLaMA route.

saeleor | a month ago

I’d like to know how Nancy Pelosi, who sure as hell doesn’t know what Apache Spark is, bought $1 million worth (and maybe $5 million) of Databricks stock days ago.

https://www.dailymail.co.uk/sciencetech/article-13228859/amp...

jjtheblunt | a month ago

Interesting that they haven't released DBRX MoE-A and MoE-B. For many use cases, smaller models are sufficient. I wonder why that is?

zopper | a month ago

What's a good model to help with medical research? Is there anything trained on just research journals, like NIH studies?

hanniabu | a month ago

Is this also going to be the ticker name when they IPO?

airocker | a month ago

The bourgeois have fun with number crunchers. Clowns: compare metrics normalized to tokens/second/watt and tokens/second per RAM configuration (8GB, 16GB, 32GB) on consumer GPUs.

evrial | a month ago

What’s the process to deliver and test a quantized version of this model?

This model is 264GB, so it can only be deployed in server settings.

Quantized Mixtral at 24GB is just small enough that it can run on premium consumer hardware (i.e., 64GB of RAM).
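
One plausible path, assuming the architecture is supported by transformers and bitsandbytes (a sketch, not a tested recipe): load the checkpoint in 4-bit, then rerun your eval harness, since a quantized model is a different model. Note that even at 4 bits, 132B parameters is ~66GB of weights, so it's still out of reach for most 64GB consumer machines:

    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "databricks/dbrx-instruct"  # gated: requires accepting the license
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb,
        device_map="auto",
        trust_remote_code=True,
    )
    # Then rerun the benchmarks (e.g., lm-evaluation-harness) on this model;
    # its scores are not the same as the fp16 checkpoint's.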

kurtbuilds | a month ago

Slowly going from mixture of experts to committee? ^^

ziofill | a month ago

It is not open

" Get Started with DBRX on Databricks

If you’re looking to start working with DBRX right away, it’s easy to do so with the Databricks Mosaic AI Foundation Model APIs. You can quickly get started with our pay-as-you-go pricing and query the model from our AI Playground chat interface. For production applications, we offer a provisioned throughput option to provide performance guarantees, support for finetuned models, and additional security and compliance. To privately host DBRX, you can download the model from the Databricks Marketplace and deploy the model on Model Serving."

ACV001 | a month ago

It's great how we went from "wait... this model is too powerful to open source" to everyone trying to shove their 1%-improved models down the throats of developers.

viktour19 | a month ago

So, is the business model here to release the model for free, then hope companies will run it on Databricks infra, which they will charge for?

aussieguy1234 | a month ago

It looks like TensorRT-LLM (TRT-LLM) is the way to go for a realtime API for more and more companies (e.g., Perplexity AI’s pplx-api, Mosaic’s, Baseten’s…). Would be super nice to find people deploying multimodal models (e.g., LLaVA or CLIP/BLIP) to discuss approaches (and cry a bit together!)

joaquincabezas | a month ago

Really noob question: to run this on a GPU, do you need a GPU with 264GB of RAM? And if you ran it on a CPU with 264GB of RAM, would it be super slow?

doubloon | a month ago

> Sorry, you have been blocked

> You are unable to access databricks.com

"Open", right.

grishka | a month ago

Even though the README.md calls the license the Databricks Open Source License, the LICENSE file includes paragraphs such as

> You will not use DBRX or DBRX Derivatives or any Output to improve any other large language model (excluding DBRX or DBRX Derivatives).

and

> If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights.

This is a source-available model, not an open model.

hn_acker | a month ago

I would note the actual leading models right now (IMO) are:

- Miqu 70B (General Chat)

- DeepSeek Coder 33B (Coding)

- Yi 34B (for chat over 32K context)

And of course, there are finetunes of all these.

And there are some others in the 34B-70B range I have not tried (and some I have tried, like Qwen, which I was not impressed with).

Point being that Llama 70B, Mixtral, and Grok, as seen in the charts, are not what I would call SOTA (though Mixtral is excellent for batch-size-1 speed).

brucethemoose2 | a month ago

The scale on that bar chart for "Programming (Human Eval)" is wild.

Manager: "looks ok, but can you make our numbers pop? just make the LLaMa bar smaller"

mpeg | a month ago

Looking at the license restrictions: https://github.com/databricks/dbrx/blob/main/LICENSE

"If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights."

I'm glad to see they aren't calling it open source, unlike some LLM projects. Looking at you, Llama 2.

patrick-fitz | a month ago

Less than a week after Nancy Pelosi bought a $5M stake in Databricks, this news is published.

https://twitter.com/PelosiTracker_/status/177119703064106223...

Crime pays in the US.

bboygravity | a month ago